OC-IA-P7 - Azure Text Analytics API for sentiment detection

Extract data and get a balanced sample

These tweets will carry only positive or negative sentiments.

Request API

The possible true values for a sentiment in our dataset are 0 (negative) and 4 (positive), while the Azure API can return 4 possible values: negative, neutral, positive and mixed. For homogeneity, a first idea is to map neutral and mixed to a score of 2.

This is not a good approach, since a score of 2 does not exist in the ground truth: we cannot compare the API results with the true values. How can we convert the neutral / mixed values to positive / negative ones? Let's have a closer look at the API outputs on a sample text:

This is an extreme example, but the idea is to rely on the confidence scores rather than on the sentiment label returned.

To do so, we are going to:

Since the API quota is based on the number of text records processed, not on the number of API calls, the function is modified to call the API for one text at a time.
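The conversion logic can be sketched as follows. The keys of `confidence_scores` ("positive", "neutral", "negative") match the shape of the Azure Text Analytics response, but the client call itself is omitted and `to_binary_label` is a hypothetical helper name:

```python
def to_binary_label(confidence_scores: dict) -> int:
    """Map API confidence scores to the dataset's labels: 0 (negative) or 4 (positive).

    Instead of trusting the returned sentiment string (which may be
    "neutral" or "mixed"), compare the positive and negative confidences
    directly, so that every tweet gets a binary label.
    """
    if confidence_scores["negative"] >= confidence_scores["positive"]:
        return 0
    return 4

# A "neutral" answer whose negative confidence dominates maps to 0
print(to_binary_label({"positive": 0.20, "neutral": 0.55, "negative": 0.25}))  # 0
print(to_binary_label({"positive": 0.60, "neutral": 0.30, "negative": 0.10}))  # 4
```

This way, neutral and mixed answers are forced into one of the two classes present in the ground truth.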

Improve prediction by adding a classification

Choice of metric

Here we have to stop and think about our aim. If we think in terms of positive reviews and negative reviews, in our case the false positive reviews are of greater consequence than the false negative ones:

In fact, what we are really interested in are the negative reviews: we want to detect "bad buzz".

So we'll make an important decision here: let's call "positive case" the case where the review is scored as negative. Why? Because precision, recall and F-scores are all computed from true positives, false positives and false negatives, but never from true negatives. So if our aim is to detect truly negative reviews, it makes sense to consider them as our positive class.

This leads us to consider the false positive and false negative cases in a new way:

Let's write it again so that it is perfectly clear: from now on, our positive label will be the label "0", which corresponds to negative sentiment (dissatisfaction). And our negative label will, of course, be the label "4".

In terms of precision and recall, this means that we are more interested in a high recall (avoiding false negatives) than in a high precision (avoiding false positives).

But we cannot focus only on recall: if we consider all our customers as dissatisfied, we may end up trying to fix imaginary dissatisfaction, which may lead to useless costs (offering vouchers to customers, for instance) or create a "ridiculous buzz" on social networks by apologizing for good reviews.

To make a tradeoff between precision and recall, we'll use the $F_\beta$ score, which lets us put more weight on false negatives:

$F_\beta = \frac{(1+\beta^2)\,TP}{(1+\beta^2)\,TP + \beta^2 FN + FP}$

So here we'll use $\beta=2$ to put the focus on false negatives.
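With our relabeling, this metric is available directly in scikit-learn via `fbeta_score` with `pos_label=0`. A minimal sketch on toy predictions (the arrays are illustrative, not from the dataset):

```python
from sklearn.metrics import fbeta_score

# Toy labels: 0 = negative review (our positive class), 4 = positive review
y_true = [0, 0, 0, 4, 4, 4, 4, 4]
y_pred = [0, 0, 4, 4, 4, 4, 0, 4]  # one missed negative (FN), one false alarm (FP)

# beta=2 weights recall twice as much as precision:
# here TP=2, FN=1, FP=1, so F2 = (1+4)*2 / ((1+4)*2 + 4*1 + 1) = 10/15
score = fbeta_score(y_true, y_pred, beta=2, pos_label=0)
print(round(score, 3))  # 0.667
```

Note the `pos_label=0` argument: without it, sklearn would treat the larger label as the positive class, which is the opposite of our convention.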

Dummy classifier
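A baseline along these lines could be sketched with scikit-learn's `DummyClassifier`, which ignores the features and predicts the most frequent class; the synthetic data below stands in for the real tweet features:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import fbeta_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.array([0] * 30 + [4] * 70)  # imbalanced: 30 negative, 70 positive tweets

dummy = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = dummy.predict(X)

# The majority class is 4, so the dummy never predicts label 0:
# recall on the negative class is 0, and so is the F2 score.
f2 = fbeta_score(y, y_pred, beta=2, pos_label=0)
print(f2)  # 0.0
```

Any real model has to beat this floor to be worth keeping.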

Logistic regression

Since the data don't seem to be linearly separable, let's try other methods.

Decision Tree

Random Forest Classifier

And with a Gradient Boosting Classifier?

Gradient Boosting

SVC

We did not fine-tune the hyperparameters or cross-validate them. But since our aim is only to get an idea of the performance we may expect from a simple model (and since the default sklearn parameters have been carefully selected to work well in many cases), we can use our scores as benchmarks for comparison with the performance of a neural network.
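The benchmark loop implied above can be sketched as follows. Each model is fitted with its scikit-learn defaults and scored with F2 (`pos_label=0`); the synthetic features and split are placeholders for the real tweet data:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.metrics import fbeta_score

# Placeholder data: label depends noisily on the first feature
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
y = np.where(X[:, 0] + rng.normal(scale=0.5, size=200) > 0, 4, 0)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

models = {
    "logistic regression": LogisticRegression(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "SVC": SVC(),
}

# Fit every model with default hyperparameters and record its F2 score
scores = {}
for name, model in models.items():
    y_pred = model.fit(X_tr, y_tr).predict(X_te)
    scores[name] = fbeta_score(y_te, y_pred, beta=2, pos_label=0)
    print(f"{name}: F2 = {scores[name]:.3f}")
```

The resulting scores serve as the benchmark table against which the neural network is compared.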